Fast Most Similar Neighbor (MSN) classifiers for Mixed Data
Abstract
The k nearest neighbor (k-NN) classifier has been widely used in Pattern Recognition because of its simplicity and good performance. However, for applications with large datasets, the exhaustive k-NN classifier becomes impractical. Therefore, many fast k-NN classifiers have been developed; most of them rely on metric properties (usually the triangle inequality) to reduce the number of prototype comparisons. Hence, the existing fast k-NN classifiers are applicable only when the comparison function is a metric (commonly for numerical data). However, in some sciences such as Medicine, Geology, and Sociology, the prototypes are usually described by qualitative and quantitative features (mixed data). In these cases, the comparison function does not necessarily satisfy metric properties. For this reason, it is important to develop fast k most similar neighbor (k-MSN) classifiers for mixed data, which use non-metric comparison functions. In this thesis, four fast k-MSN classifiers, following the most successful approaches, are proposed. Experiments over different datasets show that the proposed classifiers significantly reduce the number of prototype comparisons.
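The exhaustive search that fast k-MSN classifiers aim to avoid can be sketched as follows. This is an illustrative example, not one of the thesis's four classifiers: the per-feature comparison below (normalized absolute difference for numeric features, match/mismatch for qualitative ones) is a hypothetical mixed-data dissimilarity, chosen only to show that such functions need not be metrics.

```python
import heapq

def mixed_dissimilarity(a, b, numeric_idx, ranges):
    """Hypothetical mixed-data comparison: normalized absolute
    difference for numeric features, 0/1 mismatch for qualitative
    ones. Functions of this kind need not satisfy the triangle
    inequality, so metric-based pruning does not apply."""
    total = 0.0
    for i, (x, y) in enumerate(zip(a, b)):
        if i in numeric_idx:
            r = ranges[i] or 1.0          # feature range, avoids /0
            total += abs(x - y) / r
        else:
            total += 0.0 if x == y else 1.0
    return total

def k_msn(query, prototypes, k, numeric_idx, ranges):
    """Exhaustive k most similar neighbors: the query is compared
    against every prototype. Fast k-MSN classifiers try to skip
    most of these comparisons without assuming metric properties."""
    scored = ((mixed_dissimilarity(query, p, numeric_idx, ranges), label)
              for p, label in prototypes)
    return heapq.nsmallest(k, scored)

# Usage: prototypes with one numeric and one qualitative feature.
prototypes = [((1.0, "igneous"), "A"), ((5.0, "sedimentary"), "B"),
              ((2.0, "igneous"), "A")]
neighbors = k_msn((1.5, "igneous"), prototypes, k=2,
                  numeric_idx={0}, ranges={0: 4.0})
labels = [lab for _, lab in neighbors]   # most similar neighbors' labels
```

Classification then reduces to a majority vote over `labels`; the point of the sketch is that every prototype is touched once, which is exactly the cost the proposed classifiers reduce.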
Similar resources
Users Guide to the Most Similar Neighbor Imputation Program
Partially inventoried planning area; planning area fully populated with inventory information using global data and estimates of ground data based on "most similar neighbors." ...
Lazy Classifiers Using P-trees
Lazy classifiers store all of the training samples and do not build a classifier until a new sample needs to be classified. They differ from eager classifiers, such as decision tree induction, which build a general model (such as a decision tree) before receiving new samples. K-nearest neighbor (KNN) classification is a typical lazy classifier. Given a set of training data, a k-nearest neighbor c...
Small-Area Estimation of County-Level Forest Attributes Using Ground Data and Remote Sensed Auxiliary Information
Small-area estimation (SAE) is a concept that has considerable potential for precise estimation of forest ecosystem attributes in partitioned forest populations. In this study, several estimators were compared as SAE techniques for 12 counties in the northern Oregon Coast range. The estimators that were compared consisted of three indirect estimators, multiple linear regression (MLR), gradient ...
Fusion of multiple approximate nearest neighbor classifiers for fast and efficient classification
The nearest neighbor classifier (NNC) is a popular non-parametric classifier. It is a simple classifier with no design phase and shows good performance. Important factors affecting the efficiency and performance of NNC are (i) memory required to store the training set, (ii) classification time required to search the nearest neighbor of a given test pattern, and (iii) due to the curse of dimensi...
Predicting Implantation Outcome of In Vitro Fertilization and Intracytoplasmic Sperm Injection Using Data Mining Techniques
Objective: The main purpose of this article is to choose the best predictive model for IVF/ICSI classification and to calculate the probability of IVF/ICSI success for each couple using artificial intelligence. Also, we aimed to find the most effective factors for prediction of ART success in infertile couples. Materials and Methods: In this cross-sectional study, the data of 486 patients are colle...
Journal: Computación y Sistemas
Volume: 14, Issue: -
Pages: -
Publication year: 2010